A few notes on understanding RNNs
0. LSTM
Inputs of an LSTM cell
$X = x_1, x_2, \dots, x_n$, X.shape = (counts, n, ...)
No. | Symbol | Description | shape | tf/keras |
---|---|---|---|---|
1 | $C_{i-1}$ | cell state of cell $i-1$ (Memory Cell) | initialized; shape=(counts, 1, units) | |
2 | $H_{i-1}$ | output of cell $i-1$, the hidden layer (Hidden Layer) | shape=(counts, 1, units) | |
3 | $x_i$ | input of cell $i$ | $x_i$.shape=(counts, 1, ...) | data input |
Outputs of an LSTM cell
No. | Symbol | Description | shape | tf/keras |
---|---|---|---|---|
1 | $C_i$ | cell state of cell $i$ (Memory Cell) | shape=(counts, 1, units) | |
2 | $H_i$ | output of cell $i$, the hidden layer (Hidden Layer) | shape=(counts, 1, units) | |
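As a quick sanity check on the shapes in the two tables, here is a minimal sketch (assuming TensorFlow 2.x with eager execution; the sizes counts=4, n=5 time steps, 3 features, units=2 are made up for illustration) that prints the shapes of $H$ and $C$ returned by tf.keras.layers.LSTM:

```python
import numpy as np
import tensorflow as tf

counts, n, features, units = 4, 5, 3, 2          # made-up sizes for illustration
X = np.random.rand(counts, n, features).astype("float32")

# return_sequences=True -> hidden state H for every time step
# return_state=True     -> additionally return the final h_t and c_t
H_all, h_t, c_t = tf.keras.layers.LSTM(units,
                                       return_sequences=True,
                                       return_state=True)(X)

print(H_all.shape)  # (counts, n, units) = (4, 5, 2)
print(h_t.shape)    # (counts, units)    = (4, 2)
print(c_t.shape)    # (counts, units)    = (4, 2)
```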
About the gates
The forget gate decides which information should be discarded and which should be kept.
The input gate (also called the update gate) updates the cell state.
The output gate determines the value of the next hidden state.
Gate | Range |
---|---|
$\Gamma_f^{\langle t \rangle}$ (forget gate) | (0, 1) |
$\Gamma_i^{\langle t \rangle}$ (input/update gate) | (0, 1) |
$\Gamma_o^{\langle t \rangle}$ (output gate) | (0, 1) |
- Forget gate
$$ \Gamma_f^{\langle t \rangle} = \sigma(W_f[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f) \tag{1} $$
$b_f$: the forget-gate bias; in TensorFlow it appears as tf.nn.rnn_cell.BasicLSTMCell(num_units, forget_bias=1.0)
$h^{\langle t-1 \rangle}$: the hidden state from the previous cell
$W_f$: the weight matrix; in Keras the part applied to $h^{\langle t-1 \rangle}$ is initialized by recurrent_initializer and the part applied to $x^{\langle t \rangle}$ by kernel_initializer
$x^{\langle t \rangle}$: the current input
With num_units=128, the dimensions in equation (1) are:
Parameter | Dimension |
---|---|
$h^{\langle t-1 \rangle}$ | shape=(128, 1) |
$x^{\langle t \rangle}$ | shape=(28, 1) |
$[h^{\langle t-1 \rangle}, x^{\langle t \rangle}]$ | shape=(128, 1)+(28, 1)=(156, 1) |
$W_f$ | shape=(128, 156) |
$b_f$ | shape=(128, 1) |
Because of the sigmoid activation, $0 \le \Gamma_f^{\langle t \rangle} \le 1$.
If $\Gamma_f^{\langle t \rangle} = 0$, the current LSTM cell discards the cell state $c^{\langle t-1 \rangle}$ passed in from the previous cell.
If $\Gamma_f^{\langle t \rangle} = 1$, the current LSTM cell keeps the cell state $c^{\langle t-1 \rangle}$ passed in from the previous cell.
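As a concrete illustration of equation (1) and the dimension table above, here is a minimal NumPy sketch; the sigmoid helper and the random $W_f$, $b_f$, $h^{\langle t-1 \rangle}$, $x^{\langle t \rangle}$ are illustrative placeholders, not trained tf/keras weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

num_units, input_dim = 128, 28             # dimensions from the table above
h_prev = np.random.randn(num_units, 1)     # h^<t-1>, shape (128, 1)
x_t    = np.random.randn(input_dim, 1)     # x^<t>,   shape (28, 1)

concat = np.concatenate([h_prev, x_t], axis=0)            # [h^<t-1>, x^<t>], shape (156, 1)
W_f = np.random.randn(num_units, num_units + input_dim)   # shape (128, 156)
b_f = np.zeros((num_units, 1))                            # shape (128, 1)

gamma_f = sigmoid(W_f @ concat + b_f)      # forget gate, shape (128, 1), values in (0, 1)
print(gamma_f.shape, gamma_f.min(), gamma_f.max())
```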
- Update gate / Input gate
$$ \Gamma_u^{\langle t \rangle} = \sigma(W_u[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u) \tag{2} $$
Similar to the forget gate, $\Gamma_u^{\langle t \rangle}$ is again a vector of values between 0 and 1. It will be multiplied element-wise with $\tilde{c}^{\langle t \rangle}$ in order to compute $c^{\langle t \rangle}$.
- Updating the cell (based on the update gate and the forget gate)
To update the cell state we need to create a new vector of candidate values that we can add to our previous cell state. The equation we use is:
$$ \tilde{c}^{\langle t \rangle} = \tanh(W_c[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c) \tag{3} $$
Finally, the new cell state is:
$$ c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} \tilde{c}^{\langle t \rangle} \tag{4} $$
The tanh here corresponds to the activation='tanh' argument of tf.keras.layers.LSTM shown below.
- Output gate
To decide which outputs we will use, we will use the following two formulas:
$$ \Gamma_o^{\langle t \rangle} = \sigma(W_o[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o) \tag{5} $$
$$ h^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh(c^{\langle t \rangle}) \tag{6} $$
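Putting equations (1)-(6) together, a single forward step of an LSTM cell can be sketched in NumPy as follows. This is only an illustrative sketch with random placeholder weights and made-up sizes, not the tf/keras implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_u, b_u, W_c, b_c, W_o, b_o):
    """One LSTM cell step following equations (1)-(6)."""
    concat = np.concatenate([h_prev, x_t], axis=0)   # [h^<t-1>, x^<t>]
    gamma_f = sigmoid(W_f @ concat + b_f)            # forget gate      (1)
    gamma_u = sigmoid(W_u @ concat + b_u)            # update gate      (2)
    c_tilde = np.tanh(W_c @ concat + b_c)            # candidate cell   (3)
    c_t = gamma_f * c_prev + gamma_u * c_tilde       # new cell state   (4)
    gamma_o = sigmoid(W_o @ concat + b_o)            # output gate      (5)
    h_t = gamma_o * np.tanh(c_t)                     # new hidden state (6)
    return h_t, c_t

units, input_dim = 4, 3                              # made-up sizes
rng = np.random.default_rng(0)
W = lambda: rng.standard_normal((units, units + input_dim))
b = lambda: np.zeros((units, 1))

h, c = np.zeros((units, 1)), np.zeros((units, 1))
x = rng.standard_normal((input_dim, 1))
h, c = lstm_step(x, h, c, W(), b(), W(), b(), W(), b(), W(), b())
print(h.shape, c.shape)   # (4, 1) (4, 1)
```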
LSTM variants
1. LSTMCell examples
Keras LSTM argument breakdown
```python
tf.keras.layers.LSTM

# Class hierarchy:
# tf.layers.Layer
#  |_ tf.keras.layers.Layer
#      |_ tf.keras.layers.RNN
#          |_ tf.keras.layers.LSTM

__init__(
    units,
    activation='tanh',
    recurrent_activation='hard_sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    unit_forget_bias=True,
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    implementation=1,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)

__call__(
    inputs,
    initial_state=None,  # a tensor or list of tensors representing the initial state of the RNN layer.
    constants=None,
    **kwargs
)
```
- return_sequences=False; return_state=False
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM

x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {}".format(x_instance.shape[0]))
sess.run(tf.global_variables_initializer())
h_t = sess.run(a, feed_dict={x: x_instance})
print("h_t shape is {}".format(h_t.shape))

# cell_num is 2
# x input counts/shape[0] is 4
# h_t shape is (4, 2)
```
h_t is $h^{\langle t \rangle}$, with shape=(counts, units).
- return_sequences=True; return_state=False
```python
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num, return_sequences=True, return_state=False)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {}, x input timestamp/shape[1] is {}, shape is {}".format(
    x_instance.shape[0], x_instance.shape[1], x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all = sess.run(a, feed_dict={x: x_instance})
print("H_all shape is {}".format(H_all.shape))

# cell_num is 2
# x input counts/shape[0] is 4, x input timestamp/shape[1] is 1, shape is (4, 1, 3)
# H_all shape is (4, 1, 2)
```
H_all stacks the hidden states of every time step, $[h^{\langle 1 \rangle}, h^{\langle 2 \rangle}, \dots, h^{\langle t \rangle}]$.
- return_sequences=False; return_state=True
```python
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num, return_sequences=False, return_state=True)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {}, x input timestamp/shape[1] is {}, shape is {}".format(
    x_instance.shape[0], x_instance.shape[1], x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all, h_t, c_t = sess.run(a, feed_dict={x: x_instance})
```
In this case H_all == h_t: with return_sequences=False, the first return value is simply the final hidden state.
- return_sequences=True; return_state=True
```python
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num, return_sequences=True, return_state=True)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {}, x input timestamp/shape[1] is {}, shape is {}".format(
    x_instance.shape[0], x_instance.shape[1], x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all, h_t, c_t = sess.run(a, feed_dict={x: x_instance})
```
H_all again stacks the hidden states of every time step, $[h^{\langle 1 \rangle}, h^{\langle 2 \rangle}, \dots, h^{\langle t \rangle}]$, while h_t and c_t are the final hidden state and cell state.
A preliminary survey shows that the commonly used LSTM layers are Keras.layers.LSTM, Tensorflow.contrib.nn.LSTMCell, and Tensorflow.nn.rnn_cell.LSTMCell; the latter two share the same implementation logic.
```python
LSTMCell = tf.contrib.rnn.BasicLSTMCell(num_units)
tf.nn.rnn_cell.BasicLSTMCell(num_units)

tf.keras.layers.RNN(          # recommended
    cell,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)
```
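For the recommended cell + tf.keras.layers.RNN route, here is a minimal sketch (assuming TensorFlow 2.x, where tf.keras.layers.LSTMCell plays the role of the deprecated BasicLSTMCell; the sizes are made up):

```python
import numpy as np
import tensorflow as tf

num_units = 2
cell = tf.keras.layers.LSTMCell(num_units)       # a single LSTM cell
layer = tf.keras.layers.RNN(cell,                # wrap the cell into a recurrent layer
                            return_sequences=True,
                            return_state=True)

x = np.random.rand(4, 5, 3).astype("float32")    # (counts, time_steps, features)
H_all, h_t, c_t = layer(x)
print(H_all.shape, h_t.shape, c_t.shape)         # (4, 5, 2) (4, 2) (4, 2)
```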
2. Backpropagation through time (BPTT)
For the memory cell M
```python
# h_all_1 shape = [counts, time_steps, cell_num_1]
h_all_2 = LSTM(32, return_sequences=True, return_state=False, dropout=0,
               recurrent_initializer='orthogonal',
               kernel_initializer='orthogonal')(h_all_1)
# h_all_2 shape = [counts, time_steps, 32]
h_all_3 = LSTM(8, return_sequences=True, return_state=False, dropout=0,
               recurrent_initializer='orthogonal',
               kernel_initializer='orthogonal')(h_all_2)
# h_all_3 shape = [counts, time_steps, 8]
lay3 = GlobalMaxPool1D()(h_all_3)
# lay3 shape = [counts, 8]
lay4 = tf.layers.dense(lay3, units=20, activation=tf.nn.relu)
# dense layer: output = activation(dot(input, kernel) + bias)
# lay4 shape = [counts, 20]
print("lay4 shape is {}".format(lay4.shape))
```
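The same stacked architecture can also be sketched as a tf.keras Sequential model (assuming TF 2.x; the layer sizes 32/8/20 come from the comments above, while the input shape (10, 16) is made up for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.LSTM(32, return_sequences=True, input_shape=(10, 16),  # (time_steps, features), made up
                recurrent_initializer='orthogonal',
                kernel_initializer='orthogonal'),                  # -> (counts, 10, 32)
    layers.LSTM(8, return_sequences=True,
                recurrent_initializer='orthogonal',
                kernel_initializer='orthogonal'),                  # -> (counts, 10, 8)
    layers.GlobalMaxPool1D(),                                      # -> (counts, 8)
    layers.Dense(20, activation='relu'),                           # -> (counts, 20)
])
print(model.output_shape)  # (None, 20)
```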